charts/redpanda: Gateway API TLSRoute support for external access #1447

Draft
david-yu wants to merge 13 commits into main from feat/gateway-api-tlsroute

@david-yu david-yu commented Apr 15, 2026

Summary

  • Adds Gateway API TLSRoute-based external access as an alternative to NodePort/LoadBalancer
  • Per-listener gateway: true opt-in enables gradual migration — traditional and TLSRoute listeners coexist
  • Creates bootstrap and per-broker ClusterIP services as TLSRoute backends
  • Per-listener host/hostTemplate fields enable unique SNI hostnames per external listener, solving the per-listener domain problem (Different domain per listener #1361)
  • User manages the Gateway externally; the chart only creates TLSRoute resources referencing it via parentRefs

Design

The design follows the established pattern for TLSRoute-based access using Gateway API:

  1. User brings their own Gateway — the operator/chart only manages TLSRoute resources and ClusterIP services
  2. Per-listener opt-in — each external listener independently chooses gateway mode via gateway: true, enabling gradual migration
  3. SNI-based routing — each broker gets a unique hostname, allowing the Gateway to route TLS traffic by SNI to the correct per-broker service
  4. Bootstrap + per-broker architecture — a bootstrap TLSRoute handles initial client connections; per-broker TLSRoutes handle direct broker connections after metadata discovery
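
The SNI-based routing in points 3 and 4 can be pictured as a lookup from the hostname in the TLS ClientHello to a backend Service. The Go sketch below is illustrative only, not the chart's code: the `route` type and the `buildRoutes` and `pickBackend` helpers are hypothetical names, and the Service names assume the naming shown later in this description for a release called `rp`. A real Gateway implementation performs this matching inside its data plane.

```go
package main

import (
	"fmt"
	"strings"
)

// route pairs an SNI hostname with the ClusterIP Service that backs it.
type route struct {
	Hostname string
	Backend  string
}

// buildRoutes returns one bootstrap route plus one route per broker.
// hostTemplate uses the $POD_ORDINAL placeholder from the chart values.
// The Service naming here is an assumption made for illustration.
func buildRoutes(release, host, hostTemplate string, replicas int) []route {
	routes := []route{{Hostname: host, Backend: release + "-gateway-bootstrap"}}
	for i := 0; i < replicas; i++ {
		h := strings.ReplaceAll(hostTemplate, "$POD_ORDINAL", fmt.Sprint(i))
		routes = append(routes, route{Hostname: h, Backend: fmt.Sprintf("gw-%s-%d", release, i)})
	}
	return routes
}

// pickBackend mimics SNI matching: an exact hostname match selects the backend.
func pickBackend(routes []route, sni string) (string, bool) {
	for _, r := range routes {
		if r.Hostname == sni {
			return r.Backend, true
		}
	}
	return "", false
}

func main() {
	routes := buildRoutes("rp", "redpanda.test.local", "redpanda-$POD_ORDINAL.test.local", 3)
	for _, sni := range []string{"redpanda.test.local", "redpanda-1.test.local", "other.test.local"} {
		backend, ok := pickBackend(routes, sni)
		fmt.Printf("%s -> %q (matched=%v)\n", sni, backend, ok)
	}
}
```

An unknown SNI value finds no route, which is why every broker hostname needs its own TLSRoute.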

Using TLS Passthrough (recommended)

In passthrough mode, the Gateway forwards the TLS connection as-is to the Redpanda broker. The broker presents its own TLS certificate and terminates TLS itself, so mTLS client authentication continues to work.

Client ──[TLS]──▶ Gateway ──[TLS passthrough]──▶ Redpanda broker

Step 1: Create a Gateway with TLS passthrough

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: redpanda-gateway
spec:
  gatewayClassName: envoy   # or any TLSRoute-capable implementation
  listeners:
    - name: kafka
      protocol: TLS
      port: 9094
      tls:
        mode: Passthrough    # forward TLS as-is to Redpanda

Step 2: Configure Redpanda with gateway listeners

external:
  enabled: true
  gateway:
    enabled: true
    parentRefs:
      - name: redpanda-gateway
        sectionName: kafka
    advertisedPort: 9094
tls:
  enabled: true
  certs:
    default:
      caEnabled: true
listeners:
  kafka:
    external:
      default:
        port: 9094
        gateway: true                                  # opt-in to TLSRoute mode
        host: redpanda.example.com                     # bootstrap hostname
        hostTemplate: redpanda-$POD_ORDINAL-broker.example.com  # per-broker hostname
        tls:
          enabled: true
          cert: default

Step 3: Configure DNS

Point these DNS records to the Gateway's external IP:

  • redpanda.example.com → Gateway IP (bootstrap)
  • redpanda-0-broker.example.com → Gateway IP (broker 0)
  • redpanda-1-broker.example.com → Gateway IP (broker 1)
  • redpanda-2-broker.example.com → Gateway IP (broker 2)

Step 4: Connect clients

rpk topic list \
  -X brokers=redpanda.example.com:9094 \
  -X tls.enabled=true \
  -X tls.ca=ca.crt

Using TLS Termination

In termination mode, the Gateway decrypts TLS and forwards plaintext to the broker. Clients see the Gateway's certificate rather than Redpanda's. Because client certificates never reach the broker, mTLS authentication is not available in this mode.

Note: TLS termination for TLSRoutes is not yet supported by most Gateway implementations. Passthrough is the practical choice today.

Client ──[TLS]──▶ Gateway ──[plaintext]──▶ Redpanda broker

Step 1: Create a Gateway with TLS termination

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: redpanda-gateway
spec:
  gatewayClassName: envoy
  listeners:
    - name: kafka
      protocol: TLS
      port: 9094
      tls:
        mode: Terminate       # Gateway decrypts TLS
        certificateRefs:
          - name: gateway-tls-cert

Step 2: Configure Redpanda without TLS on the external listener

Since the Gateway handles TLS, the Redpanda listener receives plaintext:

external:
  enabled: true
  gateway:
    enabled: true
    parentRefs:
      - name: redpanda-gateway
        sectionName: kafka
    advertisedPort: 9094
listeners:
  kafka:
    external:
      default:
        port: 9094
        gateway: true
        host: redpanda.example.com
        hostTemplate: redpanda-$POD_ORDINAL-broker.example.com
        # No TLS config — connection arrives decrypted from the Gateway

Migrating from Traditional Listeners to Gateway API

The per-listener gateway: true field enables zero-downtime migration. Traditional NodePort/LoadBalancer listeners and TLSRoute listeners coexist on different ports.

Step 1: Deploy the Gateway

Create the Gateway resource in your cluster:

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: redpanda-gateway
spec:
  gatewayClassName: envoy
  listeners:
    - name: kafka
      protocol: TLS
      port: 9094
      tls:
        mode: Passthrough

Step 2: Add a TLSRoute listener alongside the existing one

Update your Helm values to add a new listener with gateway: true. The existing NodePort listener continues to work:

external:
  enabled: true
  type: NodePort              # existing NodePort config unchanged
  gateway:
    enabled: true
    parentRefs:
      - name: redpanda-gateway
        sectionName: kafka
    advertisedPort: 9094
tls:
  enabled: true
  certs:
    default:
      caEnabled: true
listeners:
  kafka:
    external:
      default:                 # existing NodePort listener — unchanged
        port: 9094
        advertisedPorts:
          - 30092
      gw-listener:             # new TLSRoute listener on a different port
        port: 9095
        gateway: true
        host: redpanda.example.com
        hostTemplate: redpanda-$POD_ORDINAL-broker.example.com
        tls:
          cert: default

After helm upgrade, the cluster has both:

  • NodePort service with port 9094 (for existing clients)
  • Gateway ClusterIP services + TLSRoutes with port 9095 (for new clients)
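
The coexistence above comes down to partitioning external listeners by their per-listener flag. The Go sketch below shows that idea with a trimmed-down, hypothetical `externalListener` type; it is not the chart's actual `ServicePorts()` logic. The only detail taken from the commit history is that the flag is a `*bool`, so an unset flag keeps the traditional behavior.

```go
package main

import (
	"fmt"
	"sort"
)

// externalListener is a hypothetical stand-in for the chart's listener type.
type externalListener struct {
	Port    int
	Gateway *bool // nil or false keeps the listener on NodePort/LoadBalancer
}

// usesGateway treats an unset flag the same as false.
func usesGateway(l externalListener) bool {
	return l.Gateway != nil && *l.Gateway
}

// splitListeners returns the listener names served by the traditional
// Service and the names served through TLSRoutes, each sorted.
func splitListeners(listeners map[string]externalListener) (traditional, gateway []string) {
	for name, l := range listeners {
		if usesGateway(l) {
			gateway = append(gateway, name)
		} else {
			traditional = append(traditional, name)
		}
	}
	sort.Strings(traditional)
	sort.Strings(gateway)
	return traditional, gateway
}

func main() {
	on := true
	listeners := map[string]externalListener{
		"default":     {Port: 9094},
		"gw-listener": {Port: 9095, Gateway: &on},
	}
	traditional, gateway := splitListeners(listeners)
	fmt.Println("NodePort/LoadBalancer:", traditional)
	fmt.Println("TLSRoute:", gateway)
}
```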

Step 3: Configure DNS and migrate clients

  1. Set up DNS records pointing to the Gateway IP
  2. Migrate clients one at a time to the new bootstrap address (redpanda.example.com:9094, which is the Gateway listener port, not the broker-side port 9095)
  3. Monitor that clients are connecting through the Gateway

Step 4: Remove the old listener

Once all clients have migrated, remove the NodePort listener:

listeners:
  kafka:
    external:
      # default: removed
      gw-listener:
        port: 9095
        gateway: true
        host: redpanda.example.com
        hostTemplate: redpanda-$POD_ORDINAL-broker.example.com
        tls:
          cert: default

Optionally remove external.type: NodePort as it is no longer used.

Verified end-to-end with Envoy Gateway + rpk

Reproducible recipe used to validate this PR on a local k3d cluster, producing and consuming over TLS through the Gateway. Run from the feat/gateway-api-tlsroute branch.

1. Cluster + dependencies

# Cluster (any K8s 1.30+ should work)
k3d cluster create rp-gw-test --image rancher/k3s:v1.32.13-k3s1

# Gateway API CRDs: TLSRoute is in the experimental channel, so you must use experimental-install.yaml
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.2.1/experimental-install.yaml

# Envoy Gateway (any TLSRoute-capable Gateway implementation works; Cilium / Istio also fine)
helm install eg oci://docker.io/envoyproxy/gateway-helm \
  --version v1.2.6 \
  --namespace envoy-gateway-system --create-namespace \
  --wait

# Envoy Gateway does not auto-create a default GatewayClass; create one
kubectl apply -f - <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: eg
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
EOF

# cert-manager (for the chart's self-signed CA)
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager --create-namespace \
  --version v1.17.2 --set crds.enabled=true --wait

2. Create a Gateway (user-managed, the chart only attaches TLSRoutes)

kubectl create namespace rp-gw
kubectl apply -f - <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: redpanda-gateway
  namespace: rp-gw
spec:
  gatewayClassName: eg
  listeners:
    - name: kafka
      protocol: TLS
      port: 9094
      tls:
        mode: Passthrough
      allowedRoutes:
        kinds:
          - kind: TLSRoute
            group: gateway.networking.k8s.io
EOF

3. Install the Redpanda chart with a gateway-mode listener

values.yaml:

statefulset:
  replicas: 1
storage:
  persistentVolume:
    enabled: false
external:
  enabled: true
  domain: test.local                       # appended to the cert SANs as *.test.local
  gateway:
    enabled: true
    parentRefs:
      - name: redpanda-gateway
        sectionName: kafka
    advertisedPort: 9094                   # port advertised in broker metadata
tls:
  enabled: true
  certs:
    default:
      caEnabled: true
listeners:
  kafka:
    external:
      default:
        port: 9094
        gateway: true                      # opt this listener into TLSRoute mode
        host: redpanda.test.local          # bootstrap SNI hostname
        hostTemplate: redpanda-$POD_ORDINAL.test.local   # per-broker SNI hostname
        tls:
          enabled: true
          cert: default

helm install rp ./charts/redpanda/chart -n rp-gw -f values.yaml --wait

What the chart renders:

Resource | Name | Notes
TLSRoute | rp-kafka-default-bootstrap | hostname redpanda.test.local → Service/rp-gateway-bootstrap:9094
TLSRoute | rp-kafka-default-0 | hostname redpanda-0.test.local → Service/gw-rp-0:9094
Service (ClusterIP) | rp-gateway-bootstrap | LB to all brokers on port 9094
Service (ClusterIP) | gw-rp-0 | per-broker, selects pod rp-0
Certificate | rp-default-cert | SANs include *.test.local and test.local, covering both hostnames
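
Whether a wildcard SAN covers a given listener hostname can be checked with the Go standard library's own matching rules. The sketch below is an illustration, not part of the chart: `covered` is a hypothetical helper that builds an in-memory `x509.Certificate` carrying only DNS SANs and asks `VerifyHostname` whether a name matches. Note that a wildcard matches exactly one label, so a nested name such as `a.b.test.local` is not covered by `*.test.local`.

```go
package main

import (
	"crypto/x509"
	"fmt"
)

// covered reports whether any of the DNS SANs matches host, using the
// standard library's certificate hostname matching.
func covered(sans []string, host string) bool {
	cert := &x509.Certificate{DNSNames: sans}
	return cert.VerifyHostname(host) == nil
}

func main() {
	sans := []string{"*.test.local", "test.local"}
	for _, host := range []string{
		"redpanda.test.local",   // bootstrap hostname
		"redpanda-0.test.local", // per-broker hostname
		"a.b.test.local",        // two labels under the domain
	} {
		fmt.Printf("%-24s covered=%v\n", host, covered(sans, host))
	}
}
```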

Verify the routes are attached:

$ kubectl -n rp-gw get tlsroute
NAME                         AGE
rp-kafka-default-0           1m
rp-kafka-default-bootstrap   1m

$ kubectl -n rp-gw get gateway redpanda-gateway -o jsonpath='{.status.listeners[*].attachedRoutes}'
2

$ kubectl -n rp-gw get tlsroute -o jsonpath='{.items[*].status.parents[*].conditions[?(@.type=="Accepted")].status}'
True True

4. Connect with rpk over TLS

For local testing without DNS, run rpk in a container on the same docker network as the cluster, mapping the SNI hostnames to the Envoy data-plane IP via --add-host:

# Extract the chart-issued CA
kubectl -n rp-gw get secret rp-default-root-certificate -o jsonpath='{.data.ca\.crt}' | base64 -d > ca.crt

# Find the Envoy data-plane LoadBalancer IP
GW_IP=$(kubectl -n envoy-gateway-system get svc \
  -l gateway.envoyproxy.io/owning-gateway-name=redpanda-gateway \
  -o jsonpath='{.items[0].status.loadBalancer.ingress[0].ip}')

# Wrapper that runs rpk inside the docker network, with hostname resolution + CA mounted
RPK="docker run --rm -i --network k3d-rp-gw-test \
  --add-host redpanda.test.local:$GW_IP \
  --add-host redpanda-0.test.local:$GW_IP \
  -v $PWD/ca.crt:/etc/rp/ca.crt:ro \
  --entrypoint rpk docker.redpanda.com/redpandadata/redpanda:v25.2.5 \
  -X brokers=redpanda.test.local:9094 \
  -X tls.enabled=true -X tls.ca=/etc/rp/ca.crt"

$RPK cluster info
$RPK topic create gateway-test -p 3 -r 1
printf 'key-1\thello' | $RPK topic produce gateway-test --format '%k\t%v'
$RPK topic consume gateway-test -n 1 -o :end -f '%p:%o key=%k value=%v\n'
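
The `--add-host` trick works because the TCP destination and the SNI hostname are independent. The same check can be sketched in Go without touching name resolution: dial the Gateway IP directly and set the SNI value explicitly. This is a minimal sketch using only `crypto/tls` and `crypto/x509`; `sniConfig` is a hypothetical helper, and the three command-line arguments (Gateway address, SNI hostname, CA file) are supplied by whoever runs it.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"os"
)

// sniConfig builds a client TLS config that trusts only the given CA bundle
// and sends serverName as the SNI value (also used for certificate checks).
func sniConfig(caPEM []byte, serverName string) (*tls.Config, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no certificates found in CA bundle")
	}
	return &tls.Config{
		RootCAs:    pool,
		ServerName: serverName,
		MinVersion: tls.VersionTLS12,
	}, nil
}

func main() {
	if len(os.Args) != 4 {
		fmt.Println("usage: snicheck <gateway-ip:port> <sni-hostname> <ca-file>")
		return
	}
	caPEM, err := os.ReadFile(os.Args[3])
	if err != nil {
		fmt.Println("read CA:", err)
		return
	}
	cfg, err := sniConfig(caPEM, os.Args[2])
	if err != nil {
		fmt.Println("config:", err)
		return
	}
	// The TCP connection goes to the Gateway; routing is decided by the SNI value.
	conn, err := tls.Dial("tcp", os.Args[1], cfg)
	if err != nil {
		fmt.Println("handshake failed:", err)
		return
	}
	defer conn.Close()
	fmt.Println("handshake ok, peer SANs:", conn.ConnectionState().PeerCertificates[0].DNSNames)
}
```

With Passthrough, a successful handshake means the certificate came from the broker itself, since the Gateway holds no certificate for this listener.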

Result on the validation run:

$ rpk cluster info
ID    HOST                   PORT
0*    redpanda-0.test.local  9094

$ rpk topic create gateway-test -p 3 -r 1
TOPIC         STATUS
gateway-test  OK

$ rpk topic produce gateway-test            # 5 records, partitions 0 + 2
Produced to partition 0 at offset 0 …
Produced to partition 2 at offset 0 …
…

$ rpk topic consume gateway-test -n 5 -o :end
0:0 key=key-1 value=Hello via TLSRoute msg #1 …
0:1 key=key-4 value=Hello via TLSRoute msg #4 …
2:0 key=key-2 value=Hello via TLSRoute msg #2 …
2:1 key=key-3 value=Hello via TLSRoute msg #3 …
2:2 key=key-5 value=Hello via TLSRoute msg #5 …

SNI-based routing confirmed in the Envoy data-plane access log:

requested_server_name | upstream_cluster
redpanda.test.local | tlsroute/rp-gw/rp-kafka-default-bootstrap/rule/-1
redpanda-0.test.local | tlsroute/rp-gw/rp-kafka-default-0/rule/-1

So the design works end-to-end: bootstrap connection → bootstrap TLSRoute → bootstrap service; then the broker advertises redpanda-0.test.local:9094; the client SNI-reconnects; Envoy routes by SNI to the per-broker TLSRoute → per-broker ClusterIP → broker pod, all under TLS Passthrough using Redpanda's own cert (no Gateway-side cert needed).
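
The address a broker advertises in this flow is the per-broker hostname joined with the advertised port. A hedged Go sketch of that construction follows; `advertisedAddress` is a hypothetical helper, not the function in `secrets.go`, and the fallback to 443 when no port is set is taken from the first commit message in this PR.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
	"strings"
)

// defaultAdvertisedPort follows the default stated in the first commit message.
const defaultAdvertisedPort = 443

// advertisedAddress expands $POD_ORDINAL in hostTemplate and appends the
// advertised port, falling back to the default when the port is unset (0).
func advertisedAddress(hostTemplate string, ordinal, advertisedPort int) string {
	if advertisedPort == 0 {
		advertisedPort = defaultAdvertisedPort
	}
	host := strings.ReplaceAll(hostTemplate, "$POD_ORDINAL", strconv.Itoa(ordinal))
	return net.JoinHostPort(host, strconv.Itoa(advertisedPort))
}

func main() {
	fmt.Println(advertisedAddress("redpanda-$POD_ORDINAL.test.local", 0, 9094))
	fmt.Println(advertisedAddress("redpanda-$POD_ORDINAL.test.local", 2, 0))
}
```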

Files changed

File | Change
charts/redpanda/values.go | GatewayConfig, GatewayParentRef types; per-listener Gateway/Host/HostTemplate fields; ServicePorts() filters gateway listeners
charts/redpanda/service.gateway.go | Bootstrap + per-broker ClusterIP service generation (gateway-opted listeners only)
charts/redpanda/tlsroute.go | Bootstrap + per-broker TLSRoute generation with SNI hostnames
charts/redpanda/chart.go | Register TLSRoute type and wire into render pipeline
charts/redpanda/secrets.go | Per-listener gateway-aware advertised address construction
charts/redpanda/service.{loadbalancer,nodeport}.go | Skip gateway-opted listeners in port generation
operator/api/.../redpanda_clusterspec_types.go | CRD types for gateway config
operator/.../redpanda_controller.go | RBAC for gateway.networking.k8s.io/tlsroutes

Out of scope (future work)

  • TCPRoute support (for non-TLS listeners)
  • Gateway resource management by the operator
  • East-west (service mesh) traffic routing
  • TLSRoute status checking (wait for gateway acceptance)

Test plan

  • Verify chart compiles: go build ./charts/redpanda/...
  • Verify operator compiles: go build ./operator/...
  • Golden test: gateway-api-tlsroute — all listeners on TLSRoute
  • Golden test: gateway-api-migration — NodePort + TLSRoute coexisting
  • Full TestTemplate suite passes with no regressions
  • Manual testing with Envoy Gateway — full produce/consume via rpk over TLS, SNI routing verified (recipe + results above)
  • Regenerate templates via task generate in CI

Closes #1361

🤖 Generated with Claude Code

david-yu and others added 13 commits April 14, 2026 17:13
Adds a new external access mode using Gateway API TLSRoute resources
with SNI-based routing, enabling per-listener domain configuration
and removing the need for NodePort/LoadBalancer services.

Design:
- User provides their own Gateway; the chart only manages TLSRoutes
- Bootstrap ClusterIP service + per-broker ClusterIP services created
  as TLSRoute backends
- Per-listener host/hostTemplate fields for SNI hostname configuration
- Each external listener gets a bootstrap TLSRoute and per-broker
  TLSRoutes with unique SNI hostnames
- Default advertised port is 443 (configurable via gateway.advertisedPort)
- NodePort/LoadBalancer service generation is skipped in gateway mode

Example values:
  external:
    enabled: true
    gateway:
      enabled: true
      parentRefs:
        - name: kafka-gateway
          sectionName: kafka
  listeners:
    kafka:
      external:
        default:
          port: 9094
          host: kafka.example.com
          hostTemplate: kafka-$POD_ORDINAL.example.com

Closes #1361

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…TLSRoute

- Transpile Go source to Helm templates via gotohelm
- Define lightweight TLSRoute mirror types compatible with gotohelm
  (upstream Gateway API uses type aliases the transpiler cannot handle)
- Register TLSRoute in the chart Scheme for test deserialization
- Add gateway-api-tlsroute test case to template-cases.txtar
- Regenerate all generated files (CRDs, RBAC, deepcopy, schemas)

Golden test output confirms correct resource generation:
- Bootstrap TLSRoute with SNI hostname pointing to bootstrap ClusterIP service
- Per-broker TLSRoutes with interpolated hostnames pointing to per-broker services
- ClusterIP services (bootstrap + per-pod) as TLSRoute backends
- NodePort/LoadBalancer services correctly skipped in gateway mode

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Changes gateway mode from a global all-or-nothing switch to a
per-listener opt-in. Each external listener can independently set
`gateway: true` to use TLSRoute mode while other listeners remain
on NodePort/LoadBalancer. This enables gradual migration:

1. Deploy Gateway, configure external.gateway with parentRefs
2. Add a new listener with gateway: true alongside existing ones
3. Migrate clients to the new TLSRoute-based listener
4. Remove the old NodePort/LoadBalancer listener

Key changes:
- Add Gateway *bool field to ExternalListener
- ServicePorts() skips gateway-enabled listeners (NodePort/LB)
- gatewayServicePorts() only includes gateway-enabled listeners
- TLSRoutes only created for listeners with gateway: true
- advertisedHostJSON checks per-listener gateway flag
- Remove global NodePort/LB suppression guards
- Add gateway-api-migration test case showing dual-mode operation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Rename example hostnames from kafka-*-broker to redpanda-*-broker
  in test cases and golden files
- Add changie changelog entry for the Gateway API TLSRoute feature

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All three CI failures had the same root cause: the chart's lightweight
TLSRoute type was registered in the chart's Scheme but not in the
operator's UnifiedScheme. This caused "no kind is registered for the
type redpanda.TLSRoute" errors in:
- TestFieldManagers / TestFieldManagersRegression (migration tests)
- TestControllerRBAC (controller tests)
- TestV2ResourceClient (lifecycle tests)
- kuttl (operator binary fails to start)

Fixes:
- Register TLSRoute in the operator's V2Scheme and UnifiedScheme
  via scheme.go init()
- Fix unparam lint: addTLSRouteToScheme no longer returns an
  always-nil error

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Revert protobuf files to match CI output (buf generate strips
  license headers that the license updater adds)
- Revert zz_generated_status.go import ordering to match CI
- Regenerate operator chart golden files to include the new
  gateway.networking.k8s.io/tlsroutes RBAC rule in all test cases

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add omitempty to GatewayParentRef.Name so the zero value is omitted
  during JSON serialization, matching the generated PartialGatewayParentRef
  which uses *string with omitempty. Fixes TestHelmValuesCompat fuzz test
  that detected a round-trip mismatch: CRD emitted {"name":""} but the
  partial expected {} for an empty GatewayParentRef.

- Fix gci import ordering in zz_generated.deprecations_test.go:
  github.com/redpanda-data/common-go belongs in the third-party group.
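
The round-trip mismatch described in the first bullet can be reproduced with `encoding/json` alone. The types below are hypothetical stand-ins for the CRD type and the generated partial, not the repository's definitions; they only show how a plain `string` field and a `*string` field with `omitempty` serialize differently for the zero value.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// requiredName mirrors a field declared as a plain string without omitempty.
type requiredName struct {
	Name string `json:"name"`
}

// optionalName mirrors a partial type: pointer field with omitempty.
type optionalName struct {
	Name *string `json:"name,omitempty"`
}

// mustJSON marshals v and panics on error, which cannot happen for these types.
func mustJSON(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	empty := ""
	fmt.Println(mustJSON(requiredName{}))             // {"name":""}
	fmt.Println(mustJSON(optionalName{}))             // {}
	fmt.Println(mustJSON(optionalName{Name: &empty})) // {"name":""}
}
```

The third line shows why forcing a non-nil pointer in the fuzz input also makes the two shapes agree: `omitempty` drops a nil pointer but keeps a pointer to an empty string.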

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use direct type conversion TLSRouteParentRef(ref) instead of
field-by-field struct literal, as the linter's unconvert/gocritic
fix recommends (identical field sets allow direct conversion).
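
For readers unfamiliar with the rule the linter relies on: Go allows converting between two struct types when their fields have the same names, types, and order (struct tags are ignored). The sketch below uses lowercase, hypothetical mirror types with an illustrative field set; the real `GatewayParentRef` and `TLSRouteParentRef` may carry different fields.

```go
package main

import "fmt"

// gatewayParentRef and tlsRouteParentRef are illustrative types with
// identical field sets, which is what makes direct conversion legal.
type gatewayParentRef struct {
	Name        string
	Namespace   string
	SectionName string
}

type tlsRouteParentRef struct {
	Name        string
	Namespace   string
	SectionName string
}

// toTLSRouteParentRefs converts each ref directly instead of copying
// field by field.
func toTLSRouteParentRefs(refs []gatewayParentRef) []tlsRouteParentRef {
	out := make([]tlsRouteParentRef, 0, len(refs))
	for _, ref := range refs {
		out = append(out, tlsRouteParentRef(ref))
	}
	return out
}

func main() {
	refs := []gatewayParentRef{{Name: "redpanda-gateway", SectionName: "kafka"}}
	fmt.Printf("%+v\n", toTLSRouteParentRefs(refs))
}
```

If a field is later added to only one of the two types, the conversion stops compiling, which surfaces the drift at build time.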

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The previous commit simplified toTLSRouteParentRefs to use a direct
type conversion but did not regenerate the Helm template. The gotohelm
transpiler now emits a simpler template that just copies the ref
directly instead of field-by-field merging.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two fixes:

1. TestHelmValuesCompat: The fuzz test generates GatewayParentRef with
   Name=nil in the partial (*string), but the CRD type uses Name string
   (always serialized). Add a fixup callback to ensure Name is always
   non-nil in the fuzz input, matching the CRD's required field behavior.
   Revert the omitempty addition on both types since Name is required.

2. Golden files: After the toTLSRouteParentRefs type conversion change,
   the template no longer emits null fields (group, kind, namespace)
   for parent refs. Regenerated golden files to match.

Verified locally:
- TestHelmValuesCompat passes 10/10 runs
- TestTemplate (all cases) passes
- golangci-lint clean

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Previous k8s:generate run reverted the import fix. Reapplying:
github.com/redpanda-data/common-go belongs in the third-party group.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions

This PR is stale because it has been open 5 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions Bot added the stale label Apr 23, 2026
@david-yu david-yu removed the stale label Apr 23, 2026

@david-yu
Contributor Author

End-to-end test results (PASS)

Validated this PR locally on a k3d cluster (HEAD 075130b2). Full step-by-step recipe is now in the PR description under Verified end-to-end with Envoy Gateway + rpk.

Stack

  • k3d on rancher/k3s:v1.32.13-k3s1
  • Gateway API CRDs v1.2.1 (experimental channel — TLSRoute is in experimental)
  • Envoy Gateway v1.2.6 with mode: Passthrough on port 9094
  • cert-manager v1.17.2 issuing the chart's self-signed CA
  • charts/redpanda/chart from this branch, 1 broker, external.domain=test.local, listener kafka.external.default with gateway: true, host: redpanda.test.local, hostTemplate: redpanda-$POD_ORDINAL.test.local

What worked

Check | Result
Chart renders 2 TLSRoutes (bootstrap + per-broker) | rp-kafka-default-bootstrap, rp-kafka-default-0
TLSRoutes accepted by Envoy Gateway | Accepted=True, ResolvedRefs=True on both; Gateway reports attachedRoutes: 2
Cert SANs cover bootstrap + per-broker hostnames | *.test.local + test.local from external.domain covers both
rpk cluster info over TLS via bootstrap host | ✅ Returns redpanda-0.test.local:9094 (gateway-aware advertised address)
rpk topic create / produce / consume round-trip | ✅ 5 messages, partitions 0+2, all read back correctly
SNI-based routing to the right backend | ✅ Confirmed in Envoy access log (table below)

requested_server_name=redpanda.test.local    → tlsroute/rp-gw/rp-kafka-default-bootstrap
requested_server_name=redpanda-0.test.local  → tlsroute/rp-gw/rp-kafka-default-0

The design works end-to-end: client connects to the bootstrap SNI hostname, Envoy passes TLS through to a broker, the broker advertises the per-broker hostname redpanda-0.test.local:9094, the client SNI-reconnects to that hostname, Envoy's SNI matcher selects the per-broker TLSRoute → the per-broker ClusterIP → broker pod. Redpanda's own cert is used throughout — no Gateway-side cert required for Passthrough.

Minor papercuts hit during the test

  • external.gateway and the per-listener gateway/host/hostTemplate fields are absent from charts/redpanda/chart/values.yaml. They're valid struct fields, but a documented stub in values.yaml would help discovery — happy to send a follow-up if useful.
  • Envoy Gateway's helm chart doesn't ship a default GatewayClass named eg; users have to create one. Worth a sentence in the docs section that lists Envoy as a known-good implementation.

🤖 Generated with Claude Code
